Statistical Transliteration of Judeo-Arabic Texts

نویسنده

  • Gitit Kehat
چکیده

Judeo-Arabic is a group of Arabic-based languages used by Jews for many centuries, which like some other Jewish languages (for example,Yiddish), is written in the Hebrew alphabet. Many of the great Jewish literary works of the Middle Ages were written in Judeo-Arabic. A large quantity of additional Judeo-Arabic text has become available with the digitization of manuscripts found in the Cairo Genizah.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Transliteration of Judeo-Arabic Texts into Arabic Script

! The Judeo-Arabic languages comprise a set of dialects spoken and written by Jewish communities living in Arab countries, mainly during the middle ages. Judeo-Arabic is typically written in Hebrew letters, enriched with various diacritic marks. The Judeo-Arabic spoken and written by any particular Jewish community is similar to the Arabic dialect used by their local Muslim community. In additi...

متن کامل

Using Transliteration of Proper Names from Arabic to Latin Script to Improve English-Arabic Word Alignment

Bilingual lexicons of proper names play a vital role in machine translation and cross-language information retrieval. Word alignment approaches are generally used to construct bilingual lexicons automatically from parallel corpora. Aligning proper names is a task particularly difficult when the source and target languages of the parallel corpus do not share a same written script. We present in ...

متن کامل

Study of the impact of proper name transliteration on the performance of word alignment in French-Arabic parallel corpora (Etude de l'impact de la translittération de noms propres sur la qualité de l'alignement de mots à partir de corpus parallèles français-arabe) [in French]

Bilingual lexicons play a vital role in cross-language information retrieval and machine translation. The manual construction of these lexicons is often costly and time consuming. Word alignment techniques are generally used to construct bilingual lexicons from parallel texts. Aligning single words and nominal syntagms from parallel texts is relatively a well controlled task for languages using...

متن کامل

7-bit Meta-Transliterations for 8-bit Romanizations

[7-bit encoding, transliteration] We propose a general strategy for deriving 7-bit encodings for texts in languages which use an alphabetic non-Roman script, like Arabic, Persian, Sanskrit and many other Indic scripts, and for which there is some transliteration convention using Roman letters with additional diacritical marks. These schemes, which we will call \meta-transliterations", are based...

متن کامل

A Simple but Effective Approach to Improve Arabizi-to-English Statistical Machine Translation

A major challenge for statistical machine translation (SMT) of Arabic-to-English user-generated text is the prevalence of text written in Arabizi, or Romanized Arabic. When facing such texts, a translation system trained on conventional Arabic-English data will suffer from extremely low model coverage. In addition, Arabizi is not regulated by any official standardization and therefore highly am...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014